Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add decrease interface for aggregation #9737

Merged
merged 50 commits into from
Jan 17, 2025

Conversation

xzhangxian1008
Copy link
Contributor

@xzhangxian1008 xzhangxian1008 commented Dec 20, 2024

What problem does this PR solve?

Issue Number: ref #7376

Problem Summary:

What is changed and how it works?

In order to support aggregation in window function, we need to add some interfaces for aggregation. In this pr, we add `decrease` interface for aggregation.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 20, 2024
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 10, 2025
@@ -66,7 +66,7 @@ class AggregateFunctionForEach final
AggregateFunctionForEachData & ensureAggregateData(
AggregateDataPtr __restrict place,
size_t new_size,
Arena & arena) const
Arena * arena) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no need to change it now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no need to change it now?

revoked

namespace DB
{
template <typename T>
struct SingleValueDataFixedForWindow : public SingleValueDataFixed<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this struct need inherit SingleValueDataFixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think SingleValueDataFixedForWindow can not just inherit the ser/deser functions from SingleValueDataFixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this struct need inherit SingleValueDataFixed

we now delete the inheritence.

using Self = SingleValueDataStringForWindow;

// TODO use std::string is inefficient
mutable std::multiset<std::string> saved_values;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is collator take affect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is collator take affect?

Collator comparison is added now.

if constexpr (is_add)
{
this->addCounter(place);
this->setFlag(place);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why do we need to inherit AggregateFunctionNullBase


if (result_is_nullable && need_counter)
{
auto tmp = prefix_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we still need null flag

saved_values.erase(iter);
}

void changeIfLess(const IColumn & column, size_t row_num, Arena *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can define an add method, both changeIfLess and changeIfGreater can call add

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can define an add method, both changeIfLess and changeIfGreater can call add

done

if constexpr (is_min)
iter = saved_values.begin();
else
iter = std::prev(saved_values.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use iter = saved_values.rbegin()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use iter = saved_values.rbegin()?

done

{}

// TODO use std::string is inefficient
std::string value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use StringRef or std::string_view

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use StringRef or std::string_view

Using StringRef now.

@@ -91,13 +90,23 @@ class AggregateFunctionNullBase : public IAggregateFunctionHelper<Derived>
}

public:
explicit AggregateFunctionNullBase(AggregateFunctionPtr nested_function_)
explicit AggregateFunctionNullBase(AggregateFunctionPtr nested_function_, bool need_counter = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need_counter is still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need_counter is still needed?

deleted

, collator(collator_)
{}

// TODO use StringRef is inefficient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the todo mean, is there any more efficient implement here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the todo mean, is there any more efficient implement here?

It can be deleted now.

mutable multiset saved_values;
TiDB::TiDBCollatorPtr collator{};

void saveValue(StringRef value) { saved_values.insert(StringWithCollator(value, collator)); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void saveValue(StringRef value) { saved_values.insert(StringWithCollator(value, collator)); }
void saveValue(StringRef & value) { saved_values.insert(StringWithCollator(value, collator)); }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


switch (prefix_size)
{
case 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use an array like [sizeof(uint64), sizeof(uint64), sizeof(uint64), 9, sizeof(uint64), 10, 12, 14], so the result_prefix_size = array[prefix_size], or use a static array so it can be init at runtime without assuming sizeof(uint64) == 8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use an array like [sizeof(uint64), sizeof(uint64), sizeof(uint64), 9, sizeof(uint64), 10, 12, 14], so the result_prefix_size = array[prefix_size], or use a static array so it can be init at runtime without assuming sizeof(uint64) == 8

done

@@ -505,4 +504,197 @@ class AggregateFunctionNullVariadic final
std::array<char, MAX_ARGS> is_nullable; /// Plain array is better than std::vector due to one indirection less.
};

// Make the prefix_size >= sizeof(UInt64) and could fit the align size
inline size_t enlarge_prefix_size(size_t prefix_size) noexcept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like as the function name, it should be enlargePrefixSize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like as the function name, it should be enlargePrefixSize

done

fmt::format("Cannot allocate memory (posix_memalign), size: {}, alignment: {}.", size, alignment),
ErrorCodes::CANNOT_ALLOCATE_MEMORY,
res);
buf = new_buf;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buf is always nullptr when this function is called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buf is always nullptr when this function is called?

yes

public:
AlignedBuffer() = default;
AlignedBuffer(size_t size, size_t alignment);
AlignedBuffer(AlignedBuffer && old) noexcept { std::swap(buf, old.buf); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems useless, because there is no guarantee that the alignment is the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems useless, because there is no guarantee that the alignment is the same?

deleted

{
assert(prefix_size != 0);
static_assert((sizeof(prefix_size_look_up_table) / sizeof(UInt64)) == 7);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asdd another static_assert to make sure sizeof(UInt64) == 8?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asdd another static_assert to make sure sizeof(UInt64) == 8?

done

Copy link
Contributor

@windtalker windtalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 17, 2025
Copy link
Contributor

ti-chi-bot bot commented Jan 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: guo-shaoge, windtalker

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [guo-shaoge,windtalker]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

ti-chi-bot bot commented Jan 17, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-01-10 05:51:18.42432135 +0000 UTC m=+505621.713153055: ☑️ agreed by guo-shaoge.
  • 2025-01-17 02:16:59.579515284 +0000 UTC m=+257691.034561430: ☑️ agreed by windtalker.

@ti-chi-bot ti-chi-bot bot merged commit 8dade17 into pingcap:master Jan 17, 2025
5 checks passed
@xzhangxian1008 xzhangxian1008 deleted the decrease-agg branch January 17, 2025 05:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants